Skip to content

Latest commit

 

History

History

classification

InternViT-6B for Image Classification

This folder contains the implementation of the InternViT-6B for image classification.

The codebase for this part is derived from InternImage, with some code references to EVA and DINOv2. Thanks for their great work.

InternViT-6B follows the structure of vanilla ViT, and its hyperparameters are listed in the table below.

image

🛠️ Installation

See INSTALLATION.md

📦 Data Preparation

Please prepare the dataset according to your needs.

First, please prepare the ImageNet-1K, ImageNet-A, ImageNet-R, ImageNetV2, and ImageNet-Sketch datasets following the directory structure outlined below.

$ tree data
data
├── imagenet-1k
│         ├── train
          │    ├── n01498041
          │    └── ...
│         └── val
│              ├── ILSVRC2012_val_00000001.JPEG
│              └── ...
├── imagenet-a
│         ├── n01498041
│         └── ...
├── imagenet-r
│         ├── n01443537
│         └── ...
├── imagenet-sketch
│         ├── n01440764
│         └── ...
└── imagenetv2
    └── ImageNetV2-matched-frequency

Then, unzip the train.txt.zip and val.txt.zip in meta_data/.

cd meta_data/
unzip train.txt.zip
unzip val.txt.zip

📦 Model Preparation

model name type download size
InternViT-6B-224px pytorch 🤗 HF link 12 GB
InternViT-6B-224px-head pytorch 🤗 HF link 25.7 MB

Please download the above model weights and place them in the pretrained/ folder.

cd pretrained/
wget https://huggingface.co/OpenGVLab/InternVL/resolve/main/intern_vit_6b_224px.pth
wget https://huggingface.co/OpenGVLab/InternVL/resolve/main/intern_vit_6b_224px_head.pth

The directory structure is:

pretrained
├── intern_vit_6b_224px_head.pth
└── intern_vit_6b_224px.pth

🔍 Linear Probing on ImageNet-1K

Note, please install apex before training (see installation guide above for details).

To train a linear classifier for InternViT-6B on ImageNet with 8 GPUs, run:

python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py --cfg configs/intern_vit_6b_1k_224.yaml
# or manage jobs with slurm
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/intern_vit_6b_1k_224.yaml --launcher slurm

📊 Evaluation

model name IN-1K IN-ReaL IN-V2 IN-A IN-R IN-Sketch download
intern_vit_6b_1k_224.yaml 88.2 90.4 79.9 77.5 89.8 69.1 ckpt | log
Evaluate InternViT-6B on ImageNet-1K val with 8 GPUs (click to expand).
python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py --eval \
    --cfg configs/intern_vit_6b_1k_224.yaml --resume pretrained/intern_vit_6b_224px_head.pth
# or manage jobs with slurm
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/intern_vit_6b_1k_224.yaml --eval \
    --resume pretrained/intern_vit_6b_224px_head.pth --launcher slurm

Expected results:

 * Acc@1 88.230 Acc@5 98.474
Accuracy of the network on the 50000 test images: 88.2%
Evaluate InternViT-6B on ImageNet-ReaL with 1 GPU (click to expand).

Note: ImageNet-ReaL now only supports single-GPU testing.

python -m torch.distributed.launch --nproc_per_node 1 --master_port 12345 main.py --eval \
    --cfg configs/intern_vit_6b_1k_224_test_imagenet_real.yaml --resume pretrained/intern_vit_6b_224px_head.pth
# or manage jobs with slurm
GPUS=1 GPUS_PER_NODE=1 sh train_in1k.sh <partition> <job-name> configs/intern_vit_6b_1k_224_test_imagenet_real.yaml --eval \
    --resume pretrained/intern_vit_6b_224px_head.pth --launcher slurm

Expected results:

* ReaL Acc@1 90.437 Acc@5 98.567 loss 0.605
ReaL Accuracy of the network on the 50000 test images: 90.4%
Evaluate InternViT-6B on ImageNetV2 with 8 GPUs (click to expand).
python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py --eval \
    --cfg configs/intern_vit_6b_1k_224_test_imagenetv2.yaml --resume pretrained/intern_vit_6b_224px_head.pth
# or manage jobs with slurm
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/intern_vit_6b_1k_224_test_imagenetv2.yaml --eval \
    --resume pretrained/intern_vit_6b_224px_head.pth --launcher slurm

Expected results:

 * Acc@1 79.940 Acc@5 95.340
Accuracy of the network on the 10000 test images: 79.9%
Evaluate InternViT-6B on ImageNet-A with 8 GPUs (click to expand).
python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py --eval \
    --cfg configs/intern_vit_6b_1k_224_test_imagenet_a.yaml --resume pretrained/intern_vit_6b_224px_head.pth
# or manage jobs with slurm
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/intern_vit_6b_1k_224_test_imagenet_a.yaml --eval \
    --resume pretrained/intern_vit_6b_224px_head.pth --launcher slurm

Expected results:

 * Acc@1 77.479 Acc@5 92.737
Accuracy of the network on the 7500 test images: 77.5%
Evaluate InternViT-6B on ImageNet-R with 8 GPUs (click to expand).
python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py --eval \
    --cfg configs/intern_vit_6b_1k_224_test_imagenet_r.yaml --resume pretrained/intern_vit_6b_224px_head.pth
# or manage jobs with slurm
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/intern_vit_6b_1k_224_test_imagenet_r.yaml --eval \
    --resume pretrained/intern_vit_6b_224px_head.pth --launcher slurm

Expected results:

 * Acc@1 89.777 Acc@5 97.023
Accuracy of the network on the 30000 test images: 89.8%
Evaluate InternViT-6B on ImageNet-Sketch with 8 GPUs (click to expand).
python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py --eval \
    --cfg configs/intern_vit_6b_1k_224_test_imagenet_sketch.yaml --resume pretrained/intern_vit_6b_224px_head.pth
# or manage jobs with slurm
GPUS=8 sh train_in1k.sh <partition> <job-name> configs/intern_vit_6b_1k_224_test_imagenet_sketch.yaml --eval \
    --resume pretrained/intern_vit_6b_224px_head.pth --launcher slurm

Expected results:

 * Acc@1 69.117 Acc@5 88.341
Accuracy of the network on the 50889 test images: 69.1%